Java Extension Optimizations #835
private static final int DEFAULT_CAPACITY = 1024;

private int totalLength;
private byte[][] segments = new byte[21][];
Why 21? The minimum segment size is 1024 for the first segment. The code doubles the segment size for each additional segment. Based on this doubling, we only need 21 segments before we hit Integer.MAX_VALUE.
Makes sense. 👏
Maybe a comment or well-named constant so nobody else asks that question in the future?
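To make the reasoning about 21 segments concrete, here is a minimal sketch of a doubling segmented buffer with a named constant. It is not the PR's `SegmentedByteListDirectOutputStream`; the class name `SegmentedBufferSketch`, the constant `MAX_SEGMENTS`, and the helper `segmentSize` are hypothetical names chosen for illustration.

```java
// Illustrative sketch only; not the implementation from this PR.
final class SegmentedBufferSketch {
    private static final int DEFAULT_CAPACITY = 1024;

    // Segment k holds DEFAULT_CAPACITY << k bytes: 2^10, 2^11, ..., 2^30.
    // A 22nd segment would need 2^31 bytes, which overflows int and is
    // larger than any Java array, so 21 slots are enough.
    private static final int MAX_SEGMENTS = 21;

    private final byte[][] segments = new byte[MAX_SEGMENTS][];
    private int current = 0;      // index of the segment being filled
    private int posInCurrent = 0; // next free offset in that segment
    private int totalLength = 0;

    SegmentedBufferSketch() {
        segments[0] = new byte[DEFAULT_CAPACITY];
    }

    // Size in bytes of the segment at the given index (hypothetical helper).
    static int segmentSize(int index) {
        return DEFAULT_CAPACITY << index;
    }

    void write(int b) {
        if (posInCurrent == segments[current].length) {
            if (current + 1 == MAX_SEGMENTS) {
                throw new IllegalStateException("buffer full");
            }
            // Growing allocates a new segment; bytes already written stay put.
            current++;
            segments[current] = new byte[segmentSize(current)];
            posInCurrent = 0;
        }
        segments[current][posInCurrent++] = (byte) b;
        totalLength++;
    }

    // Copies each segment exactly once into a single contiguous array.
    byte[] toByteArray() {
        byte[] out = new byte[totalLength];
        int offset = 0;
        for (int i = 0; i <= current; i++) {
            int n = (i == current) ? posInCurrent : segments[i].length;
            System.arraycopy(segments[i], 0, out, offset, n);
            offset += n;
        }
        return out;
    }
}
```

With these sizes the 21 segments together hold 2^31 - 1024 bytes, just under Integer.MAX_VALUE, and the only copying happens once in `toByteArray`.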
Synthetic benchmarks of encoding an array of 128-byte ASCII strings.
SegmentedByteListDirectOutputStream + SWAR
ByteListDirectOutputStream + Scalar (effectively the same code as master)
SegmentedByteListDirectOutputStream + Scalar
ByteListDirectOutputStream + SWAR
Master
if (pos + 4 <= len) {
    int x = bb.getInt(ptr + pos);
    int is_ascii = 0x808080 & ~x;
This hex number only checks 3 bytes.
Maybe 0x808080 → 0x80808080
Great catch, thank you! Late night coding without my glasses...
Interestingly no spec failed. I'll try to address that.
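A small sketch of why the mask width matters. It only demonstrates the high-bit test on a 4-byte word, not the full encoder logic; the class `SwarMaskSketch` and its helpers are hypothetical, and it assumes the buffer uses `ByteBuffer`'s default big-endian order, so the first byte read lands in the most significant lane.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; names are hypothetical.
final class SwarMaskSketch {
    static final int THREE_BYTE_MASK = 0x808080;   // same value as 0x00808080
    static final int FOUR_BYTE_MASK = 0x80808080;

    // True if any byte lane covered by the mask has its high bit set.
    static boolean anyHighBit(int x, int mask) {
        return (x & mask) != 0;
    }

    // Reads the first four bytes as one int (big-endian by default).
    static int firstFourBytes(byte[] data) {
        return ByteBuffer.wrap(data).getInt(0);
    }
}
```

For the bytes `0xC3 'a' 'b' 'c'`, the word is `0xC3616263`: the three-byte mask reports no high bit, while the four-byte mask catches the non-ASCII byte in the top lane.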
As of commit
Benchmarks as of this commit
SWAR + SegmentedByteListDirectOutputStream
SWAR + ByteListDirectOutputStream
Note: This did seem like a particularly good run, at least for the
I'm happy to disable the
Benchmarks from a MacBook Pro M4. I ran these a bunch of times and the results do vary a bit each run but I grabbed a random sampling. The big surprise is the
Note, while I don't have the benchmarks here, if I do run the
SegmentedByteListDirectOutputStream + SWAR
ByteListDirectOutputStream + SWAR
SegmentedByteListDirectOutputStream + Scalar
ByteListDirectOutputStream + Scalar
@samyron Great results! I think we could go ahead with this any time, pending my couple of minor review comments that should be addressed. The segmented stream is consistently faster than the old logic, and coupled with SWAR it can be much faster. I'd like to see this land so we can get back to playing with the vector API.
@headius I'm happy to address the comments but I don't see any review comments on this PR...
headius left a comment
Only minor changes needed
protected final byte[] escapeTable;

private static final String USE_SWAR_BASIC_ENCODER_PROP = "json.useSWARBasicEncoder";
Let's prefix this with jruby. like other properties in JRuby and other libs.
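A minimal sketch of what the prefixed property lookup could look like. The exact name `jruby.json.useSWARBasicEncoder` is an assumption based on the suggestion above, and `EncoderFlagsSketch` is a hypothetical class; the default of `true` follows the PR description.

```java
// Illustrative sketch only; the property name is assumed, not confirmed.
final class EncoderFlagsSketch {
    static final String USE_SWAR_BASIC_ENCODER_PROP = "jruby.json.useSWARBasicEncoder";

    // Reads the flag on each call; an unset property falls back to "true".
    static boolean useSwarBasicEncoder() {
        return Boolean.parseBoolean(
                System.getProperty(USE_SWAR_BASIC_ENCODER_PROP, "true"));
    }
}
```

The flag could then be turned off at launch with `-Djruby.json.useSWARBasicEncoder=false`.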
@samyron D'oh, I had started a review but never submitted it. Just a couple of minor changes and we can merge.
Ship it!
…er implementation.
…SegmentedByteListDirectOutputStream.
…ing the output buffer.
…byte in that chunk that needs escaping.
182105a to 43a8a83
Changelog 📓
OutputStream to reduce System.arraycopy's each time the output buffer is resized.
StringEncoder#encode to include a SWAR-based fast path for basic JSON encoding. The algorithm is from this post. It's the same as the vector-based algorithm in the C extension.
These features can be toggled with the system properties json.useSegmentedOutputStream and json.useSWARBasicEncoder. Both default to true. I'm happy to remove these. They made testing and benchmarking much easier.
Benchmarks
SegmentedByteListDirectOutputStream + SWAR
ByteListDirectOutputStream + SWAR
ByteListDirectOutputStream + Scalar
SegmentedByteListDirectOutputStream + Scalar
master (as of commit 37e6890)
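For readers unfamiliar with the SWAR fast path named in the changelog, the idea can be sketched roughly as follows. This is not the code from this PR or from the referenced post. It uses the well-known "has a zero byte" and "has a byte less than n" word tricks to test four bytes at once, and it assumes that a byte needs escaping when it is a control character (below 0x20), a double quote, a backslash, or non-ASCII. The class and method names are hypothetical.

```java
// Illustrative sketch only; not the encoder from this PR.
final class SwarEscapeSketch {
    private static final int ONES = 0x01010101;
    private static final int HIGH = 0x80808080;

    // True if any of the four byte lanes of v is zero.
    private static boolean hasZeroByte(int v) {
        return ((v - ONES) & ~v & HIGH) != 0;
    }

    // True if any of the four bytes packed in x would need escaping
    // under the assumptions stated above.
    static boolean needsEscape(int x) {
        boolean nonAscii = (x & HIGH) != 0;                    // any byte >= 0x80
        boolean control = ((x - 0x20202020) & ~x & HIGH) != 0; // any byte < 0x20
        boolean quote = hasZeroByte(x ^ 0x22222222);           // any byte == '"'
        boolean backslash = hasZeroByte(x ^ 0x5C5C5C5C);       // any byte == '\\'
        return nonAscii || control || quote || backslash;
    }
}
```

When `needsEscape` returns false, all four bytes can be copied to the output unchanged; otherwise a scalar loop would handle that chunk byte by byte.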